Abstract. In recent work, we have proposed an approach to Test Data
Generation (TDG) of imperative bytecode by partial evaluation (PE) of CLP
which consists of two phases: (1) the bytecode program is first transformed
into an equivalent CLP program by means of interpretive compilation by PE,
(2) a second PE is performed in order to supervise the generation of
test-cases by execution of the CLP decompiled program. The main advantages
of TDG by PE include flexibility to handle new coverage criteria, the
possibility to obtain test-case generators, and ease of implementation. The
approach can in principle be directly applied to TDG of any imperative
language. However, when one tries to apply it to a declarative language
like Prolog, the main difficulty we have found is the generation of
test-cases which cover the more complex control flow of Prolog. Essentially,
the problem is that an intrinsic feature of PE is that it only computes
non-failing derivations, while in TDG for Prolog it is essential to generate
test-cases associated to failing computations. To address this, we propose
to transform the original Prolog program into an equivalent Prolog program
with explicit failure by partially evaluating a Prolog interpreter which
captures failing derivations w.r.t. the input program. Another issue that
we discuss in the paper is that, while in the case of bytecode the
underlying constraint domain only manipulates integers, in Prolog it should
properly handle the symbolic data manipulated by the program. The resulting
scheme is of interest for bringing the advantages which are inherent in TDG
by PE to the field of logic programming.

1 Introduction 

Test data generation (TDG) aims at automatically generating test-cases for in- 
teresting test coverage criteria. The coverage criteria measure how well the pro- 
gram is exercised by a test suite. Examples of coverage criteria are: statement 
coverage which requires that each line of the code is executed; path coverage 
which requires that every possible trace through a given part of the code is
executed; etc. There is a wide variety of approaches to TDG (see [22] for a survey).
Our work focuses on glass-box testing, where test-cases are obtained from the 
concrete program in contrast to black-box testing, where they are deduced from 
a specification of the program. Also, our focus is on static testing, where we as- 
sume no knowledge about the input data, in contrast to dynamic approaches [6] 
which execute the program to be tested for concrete input values. 

The standard approach to generating test-cases statically is to perform a
symbolic execution of the program [18,14,11], where the contents of variables 
are expressions rather than concrete values. The symbolic execution produces a 
system of constraints consisting of the conditions to execute the different paths. 
This happens, for instance, in branching instructions, like if-then-else, where we
might want to generate test-cases for the two alternative branches and hence
accumulate the conditions for each path as constraints. The symbolic execution
approach is usually combined with the use of constraint solvers in order to
handle the constraint systems by checking the feasibility of paths and, afterwards,
to instantiate the input variables.

TDG for declarative languages has received comparatively less attention than 
for imperative languages. In general, declarative languages pose different prob- 
lems to testing related to their own execution models, like laziness in func- 
tional programming (FP) and failing derivations in constraint logic programming 
(CLP). The majority of existing tools for FP are based on black-box testing (see 
e.g. [4]). An exception is [7] where a glass-box testing approach is proposed to 
generate test-cases for Curry. In the case of CLP, test-cases are obtained for 
Prolog in [16,3,21]; and very recently for Mercury in [5]. Basically the test-cases 
are obtained by first computing constraints on the input arguments that corre- 
spond to execution paths of logic programs and then solving these constraints 
to obtain test inputs for such paths. 

In recent work [2], we have proposed to employ existing partial evaluation 
(PE) techniques developed for CLP in order to automatically generate test-case
generators for glass-box testing of bytecode. PE [13] is an automatic program
transformation technique which has been traditionally used to specialise
programs w.r.t. a known part of their input data and, as Futamura predicted, can
also be used to compile programs in a (source) language to another (object) 
language (see [8]). The approach to TDG by PE of [2] consists of two indepen- 
dent CLP PE phases. (1) First, the bytecode is transformed into an equivalent 
(decompiled) CLP program by specialising a bytecode interpreter by means of 
existing PE techniques. (2) A second PE is performed in order to supervise the 
generation of test-cases by execution of the CLP decompiled program. Interest- 
ingly, it is possible to employ control strategies previously defined in the context 
of CLP PE in order to capture coverage criteria for glass-box testing of
bytecode. A unique feature of this approach is that this second PE phase allows
generating not only test-cases but also test-case generators. Another important
advantage is that, in contrast to previous work on TDG of bytecode, it does not
require devising a dedicated symbolic virtual machine.

[Figure 1: diagram of the two-phase scheme, PHASE I and PHASE II]

Fig. 1. General scheme of TDG by Partial Evaluation of CLP

In this work, we study the application of the above approach to TDG by
means of PE to the Prolog language. Compared to TDG of an imperative
language [2], the main difficulty in dealing with Prolog is generating
test-cases associated to failing computations. This happens because an
intrinsic feature of PE is that it only produces results associated to the
non-failing derivations. While this is what we need for TDG of an imperative
language (like bytecode above), we now also want to capture failing
derivations in Prolog and still rely on a standard partial evaluator. Our
proposal is to transform the original Prolog program into an equivalent
Prolog program with explicit failure by partially evaluating a Prolog
interpreter which captures failing derivations w.r.t. the input program.
This transformation is done in phase (1) above. As another difference, in
the case of bytecode the underlying constraint domain only manipulates
integers, whereas in the case of Prolog phase (2) above should properly
handle the symbolic data manipulated by the program. Compared to existing
approaches to TDG of Prolog [3,16], our approach is mainly of interest for
bringing the advantages which are inherent in TDG by PE to the field of Prolog:

(i) It is more powerful in that we can produce test-case generators which are 
CLP programs whose execution in CLP returns further test-cases on demand 
without the need to start the TDG process from scratch; 

(ii) It is more flexible, as different coverage criteria can be easily incorporated
into our framework just by adding the appropriate local control to the partial
evaluator.

(iii) It is simpler to implement compared to the development of a dedicated test- 
case generator, as long as a CLP partial evaluator is available. 

The rest of the paper is organized as follows. In the next section, we give 
some basics on PE of logic programs and describe in detail the approach to 
TDG by PE proposed in [2]. Sect. 3 discusses some fundamental issues like the 
Prolog control-flow and the notion of computation path. Then, Sect. 4 describes 
the program transformation to make failure explicit, Sect. 5 outlines existing 
methods to properly handle symbolic data during the TDG phase, and finally 
Sect. 6 concludes and discusses some ideas for future work. 

2 Basics of TDG by Partial Evaluation 

In this section we recall the basics of partial evaluation of logic programming 
and summarize the general approach of relying on partial evaluation of CLP for 
TDG of an imperative language, as proposed in [2]. 

2.1 Partial Evaluation and its Application to Compilation

We assume familiarity with basic notions of logic programming and partial eval- 
uation (see e.g. [9]). Partial evaluation is a semantics-based program transfor- 
mation technique which specialises a program w.r.t. given input data, hence, 
it is often called program specialisation. Essentially, partial evaluators are non- 
standard interpreters which evaluate goals as long as termination is guaranteed 
and specialisation is considered profitable. In logic programming, the underlying 
technique is to construct (possibly) incomplete SLD trees for the set of atoms to 
be specialised. In an incomplete tree, it is possible to choose not to further un- 
fold a goal. Therefore, the tree may contain three kinds of leaves: failure nodes, 
success nodes (which contain the empty goal), and non-empty goals which are 
not further unfolded. The latter are required in order to guarantee termination 
of the partial evaluation process, since the SLD tree being built may be infinite. Even
if the SLD trees for fully instantiated initial atoms (as regards the input
arguments) are finite, the SLD trees produced for partially instantiated initial atoms
may be infinite. This is because the SLD tree for partially instantiated atoms can
have (infinitely many) more branches than the actual SLD tree at run-time.

The role of the local control is to determine how to construct the (incomplete) 
SLD trees. In particular, the unfolding rule decides, for each resolvent, whether 
to stop unfolding or to continue unfolding it and, if so, which atom to select from 
the resolvent. On the other hand, partial evaluators need to compute SLD-trees 
for a number of atoms in order to ensure that all atoms which appear in non- 
failing leaves of incomplete SLD trees are "covered" by the root of some tree 
(this is known as the closedness condition of partial evaluation [9]). The role 
of the global control is to ensure that we do not try to compute SLD trees for 
an infinite number of atoms. The usual way of achieving this is by applying an 
abstraction operator which performs "generalizations" on the atoms for which 
SLD trees are to be built. The global control returns a set of atoms T. Finally,
the partial evaluation can then be systematically extracted from the set T (see 
[9] for details). 
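
The notions of incomplete SLD tree and unfolding rule can be made concrete
with a minimal sketch, assuming SWI-Prolog. The predicates unfold/3, conj/3
and resultants/3 and the depth bound are hypothetical names introduced here
for illustration only; a fixed depth bound is a deliberately naive local
control and is not one of the unfolding rules of [9]. The sketch handles only
user-defined predicates, with no builtins and no global control.

```prolog
:- dynamic app/3.            % dynamic so that clause/2 may inspect it

app([], Ys, Ys).
app([X|Xs], Ys, [X|Zs]) :- app(Xs, Ys, Zs).

% unfold(+Goal, +Depth, -Residual)
% Builds one branch of an incomplete SLD tree per solution.
% Residual is 'true' for a success leaf, or the conjunction of atoms
% left un-unfolded once the (assumed) depth bound is exhausted.
% Failing branches simply produce no solution.
unfold(true, _, true) :- !.
unfold((A, B), D, R) :- !,
    unfold(A, D, RA),
    unfold(B, D, RB),
    conj(RA, RB, R).
unfold(A, 0, A) :- !.        % stop unfolding: leave the atom residual
unfold(A, D, R) :-
    D1 is D - 1,
    clause(A, Body),         % one resolution step
    unfold(Body, D1, R).

% conj(+A, +B, -C): conjunction that drops redundant 'true' goals.
conj(true, B, B) :- !.
conj(A, true, A) :- !.
conj(A, B, (A, B)).

% resultants(+Atom, +Depth, -Clauses)
% One resultant (residual clause) per non-failing leaf of the tree.
resultants(Atom, Depth, Clauses) :-
    findall((Atom :- R), unfold(Atom, Depth, R), Clauses).
```

For instance, the query resultants(app(Xs,[c],Zs), 2, Cs) returns three
resultants: two with body true (for Xs of length 0 and 1) and one whose body
is the atom that was not further unfolded. The residual atom in the last
resultant is an instance of the root, which is the situation the closedness
condition is about.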

Traditionally, there have been two different approaches regarding the way in
which control decisions are taken: online and offline approaches. In online PE,
all control decisions are dynamically taken during the specialisation phase. In
offline PE, a set of previously computed annotations (often manually provided)
gives information to the control operators to decide (1) when to stop unfolding
(memoise) in the local control, and (2) how to perform generalizations in the
global control.

The development of PE techniques has allowed the so-called "interpretive
approach" to compilation, which consists in specialising an interpreter w.r.t. a
fixed object code. Interpretive compilation was proposed in Futamura's seminal
work [8], whereby compilation of a program P written in a (source) programming
language L_S into another (object) programming language L_O is achieved by
partially evaluating an interpreter for L_S written in L_O w.r.t. P. The advantages of
interpretive (de-)compilation w.r.t. dedicated (de-)compilers are well-known and
discussed in the PE literature (see, e.g., [1]). Very briefly, they include: flexibility,
it is easier to modify the interpreter in order to tune the decompilation (e.g.,
observe new properties of interest); easier to trust, it is more difficult to prove
that ad-hoc decompilers preserve the program semantics; easier to maintain,
new changes in the language semantics can be easily reflected in the interpreter.

2.2 A General Scheme to TDG of Imperative Languages by PE 

In recent work, we have proposed an approach to Test Data Generation (TDG) 
by PE of CLP [2] and used it for TDG of bytecode. The approach is generic
in that the same techniques can be applied to TDG of other imperative languages,
both low-level and high-level. In Figure 1 we overview the two main phases of
this technique. In Phase I, the input program written in some (imperative)
language L is compiled into an equivalent CLP program P_CLP. This compilation
can be achieved by means of an ad-hoc decompiler (e.g., an ad-hoc decompiler 
of bytecode to Prolog [17]) or, more interestingly, can be achieved automatically 
by relying on the first Futamura projection by means of PE for logic programs 
as explained above (e.g., [12,1,10]). 

Now, the aim of Phase II is to generate test-cases which traverse as many 
different execution paths of the input program P_L as possible, according to a given coverage criterion.
From this perspective, different test data will correspond to different execution 
paths. With this aim, rather than executing the program starting from different 
input values, the standard approach consists in performing symbolic execution 
such that a single symbolic run captures the behavior of (infinitely) many input 
values. The central idea in symbolic execution is to use constraint variables 
instead of actual input values and to capture the effects of computation using 
constraints. Hence, the compilation from L to CLP allows us to use the standard 
CLP execution mechanism to carry out this phase. In particular, by running the 
P_CLP program without input values, each successful execution corresponds to a
different computation path in P_L.
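
As an illustration of running a CLP program without input values, the
following sketch assumes SWI-Prolog with library(clpfd). The predicate
abs_clp/2 is a hypothetical, hand-written CLP rendering of a two-branch
imperative fragment (it is not the output of the decompilation of [2]), and
test_input/2 and the input range -10..10 are assumptions made only so that
labeling is finite.

```prolog
:- use_module(library(clpfd)).

% Hypothetical CLP translation of the imperative fragment
%     if (x >= 0) r = x; else r = -x;
% Each clause corresponds to one execution path, and its constraints
% are the path conditions accumulated along that path.
abs_clp(X, R) :- X #>= 0, R #= X.      % path 1: then-branch
abs_clp(X, R) :- X #< 0,  R #= -X.     % path 2: else-branch

% test_input(-X, -R)
% Runs abs_clp/2 with an unbound input and then asks the solver for one
% concrete value per feasible path. The range is an assumed bound.
test_input(X, R) :-
    X in -10..10,
    abs_clp(X, R),
    once(label([X])).
```

Each answer to test_input(X, R) is one test-case: a concrete input X
satisfying the path conditions of one clause, together with the expected
result R. An infeasible path would simply yield no answer.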

Rather than relying on the standard execution mechanism, we have proposed 
in [2] to use PE of CLP to carry out Phase II. Essentially, we can rely on a CLP 
partial evaluator which is able to solve the constraint system, in much the same 
way as a symbolic abstract machine would do. Note that performing symbolic 
execution for TDG consists in building a finite (possibly unfinished) evaluation 
tree by using a non-standard execution strategy which ensures both a certain 
coverage criterion and termination. This is exactly the problem that unfolding 
rules, used in partial evaluators of (C)LP, solve. In essence, partial evaluators 
are non-standard interpreters which receive a set of partially instantiated atoms 
and evaluate them as determined by the so-called unfolding rule. Thus, the 
role of the unfolding rule is to supervise the process of building finite (possibly 
unfinished) SLD trees for the atoms. This view of TDG as a PE problem has 
important advantages. First, we can directly apply existing, powerful, unfolding 
rules developed in the context of PE. Second, it is possible to explore additional 
abilities of partial evaluators in the context of TDG. In particular, the generation 
of a residual program from the evaluation tree returns a program which can be 
used as a test-case generator, i.e., a CLP program whose execution in CLP 
returns further test-cases on demand without the need to start the TDG process
from scratch. In the rest of the paper, we study the application of this general 
approach to TDG of Prolog programs. 

3 Computation Paths for Test Data Generation of Prolog 

As we have already mentioned, test data generation is about producing test- 
cases which traverse as many different execution paths as possible. From this 
perspective, different test data should correspond to different execution paths. 
Thus, a main concern is to specify the computation paths for which we will
produce test-cases. This first requires determining the control flow of the language
considered. In this section, we aim at defining the control flow of Prolog programs
that we will use for TDG. Test data will be generated for the computation paths 
in the control flow. 

3.1 The Control Flow of Prolog 

As usual, a Prolog program consists of a set of predicates, where each predicate
is defined as a sequence of clauses of the form H :- B_1, ..., B_m with m ≥ 0. A
predicate is uniquely determined by its predicate signature p/n, where p is the
name of the predicate and n its arity. Throughout the rest of the paper we will
consider Prolog programs with the following features:

— Rules are normalized, i.e., arguments in the head of the rule are distinct 
variables. The corresponding bindings will appear explicitly in the body as 
unifications. 

— Atoms appearing in the bodies of rules can be: unifications (considered as 
builtins), calls to defined predicates, term checking builtins (==/2, \==/2, 
etc), and arithmetic builtins (is/2, </2, =</2, etc). Other typical Prolog 
builtins like fail/0, !/0, if/3, etc., have been deliberately left out to simplify
the presentation. 

— All predicates must be moded and well-typed. We will assume the existence 
of a ":- pred" declaration associated with each predicate specifying the
type expected for each argument (see as example the declarations in Fig. 2). 
Note that this assumption is sensible in the context of TDG (as the aim is 
the automatic generation of test input). Also, it should not be a limitation 
as analyses that can automatically infer this information exist. 
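
To illustrate the normalization assumed in the first item above, here is a
small sketch using the usual list concatenation predicate. The name app_n/3
is hypothetical and only serves to keep the normalized version apart from
the original clauses shown in the comments.

```prolog
% Original (non-normalized) clauses, shown for comparison:
%     app([], Ys, Ys).
%     app([X|Xs], Ys, [X|Zs]) :- app(Xs, Ys, Zs).

% Normalized version: the head arguments are distinct variables and the
% bindings that head unification used to perform appear as explicit
% unifications in the body.
app_n(A, B, C) :- A = [], C = B.
app_n(A, B, C) :- A = [X|Xs], C = [X|Zs], app_n(Xs, B, Zs).
```

Both versions compute the same answers; the normalized one exposes every
unification as a body atom, so each of them can be given its own node in a
control-flow graph.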

The control flow in Prolog programs is significantly more complex than in
traditional imperative languages. The declarative semantics of Prolog implies some
additional features like: 1) several forms of backtracking, induced by the failure
of a sub-goal, or by non-deterministic predicates; or 2) forced control flow change
by the predicate "cut". Traditionally, control-flow graphs (CFGs for short) are
used to statically represent the control-flow of programs. Typically, in a CFG,
nodes are blocks containing a set of sequential instructions, and edges represent
the flows that the program can follow w.r.t. the semantics of the corresponding
programming language. In the literature, CFGs for Prolog (and Mercury) have
been used for the aim of TDG in [16,21] ([5] for Mercury). In particular, CFGs
determine the computation paths for which test-cases will be produced. Our
framework relies on the CFGs of [16,21], which are known as p-flowgraphs.^1 As
will be explained later, there are some differences between these CFGs and the
ones in [5] which lead to different test-cases.

Miguel Gomez-Zamalloa, Elvira Albert, and German Puebla

:- pred foo/2 : num*var.

foo(X,Z) :- X > 0,
            Z = pos.
foo(X,Z) :- X = 0,
            Z = zero.

:- pred sorted/1 : list(num).

sorted(L) :- L = [].
sorted(L) :- L = [_].
sorted(L) :- L = [X,Y|R],
             X < Y,
             sorted([Y|R]).

[CFG diagrams for foo/2 and sorted/1 not recoverable]

Fig. 2. Working example. Prolog code and CFGs.


Figure 2 depicts the Prolog code together with the corresponding CFGs for 
predicates foo/2 and sorted/1. Predicate foo/2, given a number in its first 
argument, returns, in the second one, the value pos if the number is positive and 
zero if it is zero. If the number is negative, it just fails. Predicate sorted/1, 
given a list of numbers, checks whether the list is strictly sorted; if so, it
succeeds, otherwise it fails. The CFGs contain the following nodes:

— a non-terminal node associated to each atom in the body of each clause, 

— a set of terminal nodes "T_i" representing the success of the i-th clause, and

— the terminal node "F" to represent failure. 

As regards edges, in principle all non-terminal nodes have two output flows,
corresponding to the cases where the builtin or predicate call succeeds or fails
respectively. They are labeled as "yes" or "no" for builtins (including
unifications), and as "rs" (return-after-success) or "rf" (return-after-failure) for
predicate calls. There is an exception in the case of unifications where one of the
arguments is a variable, in which case the unification cannot fail. This can be
known statically by using the mode information. See for example nodes "Z=pos"
and "Z=zero" in the foo/2 CFG. Both "yes" and "rs" edges point to the node
representing the next atom in the clause or to the corresponding "T_i" node if
the atom is the last one. Finally, each "T_i" node has an output edge labeled
as "redo" to represent the case in which the predicate is asked for more
solutions. All "no", "rf" and "redo" edges point either to the node corresponding
to the first previous non-deterministic call in the same clause, or the first node
of the following clause, or the "F" node if no node meets the above conditions.
See as an example the "rs" and "rf" edges from the non-terminal node for
sorted([Y|R]).

^1 The difference with the CFGs in [16,21] is that they consider one additional node
per clause to explicitly represent the unification of the head of the rule. This is not
needed in our case since predicates are normalised.

3.2 Generating Test Data for Computation Paths 

In order to define the computation paths determined by the CFGs, every edge 
in every CFG is labeled with a unique natural number. A special edge labeled
with "0" and p/n represents the entry of predicate p/n.

Definition 1 (Computation sub-path). Given the CFG for predicate p, a
computation sub-path is a sequence of numeric labels (natural numbers) (l_1, ..., l_n)
s.t.:

— l_1 corresponds to either an entry, an "rs", an "rf" or a "redo" edge,

— l_n leads to a terminal node or to a predicate call, and

— for all consecutive labels l_i, l_j, there exists a node corresponding to a builtin
in the CFG of p, for which l_i is an input flow and l_j is an output flow.

Definition 2 (Computation path). Given the CFGs corresponding to the set
of predicates defining a program, a computation path (CP for short) for predicate
p is a concatenation sp_1 · ... · sp_m (m ≥ 1) of computation sub-paths such that:

— The first label in sp_1 is either 0, in which case we say it is a full CP, or
corresponds to a "redo" edge, in which case we say it is a partial CP (PCP for
short).

— The last label in sp_m leads to a terminal node in the CFG of p. If it is a T node,
the CP is said to be successful; otherwise it is called failing.

— For all sp_k whose last label leads to a node corresponding to a predicate call,
cp = sp_{k+1} · ... · sp_j, j > k, is a CP for the called predicate, and:

• if cp is successful then the first label in sp_{j+1} corresponds to an "rs"
edge,

• otherwise (cp is failing), it corresponds to an "rf" edge.

— For all sp_k whose first label corresponds to a "redo" edge flowing from a "T_a"
node in the CFG of predicate q, there exists sp_j, j < k, whose first label corresponds
either to an entry edge or to a "redo" edge flowing from "T_b", b < a, of the
CFG of q.

If a CP contains at least one label corresponding to a "redo" flow, then the CP
is said to be an after-retry CP. The rest of the CPs are first-try CPs.
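
The classification of CPs given at the end of Definition 2 can be sketched
as follows, assuming SWI-Prolog. A CP is represented as a list of sub-paths,
each a list of edge labels. The predicates cp_kind/3 and cp_extent/2 are
hypothetical names, and the set of labels that denote "redo" edges is an
input supplied by the caller, since it depends on the numbering chosen for
the CFG at hand.

```prolog
:- use_module(library(lists)).

% cp_kind(+SubPaths, +RedoLabels, -Kind)
% Kind is after_retry if some label of the CP is a "redo" label,
% and first_try otherwise.
cp_kind(SubPaths, RedoLabels, Kind) :-
    append(SubPaths, Labels),          % flatten sp_1 · ... · sp_m
    (   member(L, Labels),
        memberchk(L, RedoLabels)
    ->  Kind = after_retry
    ;   Kind = first_try
    ).

% cp_extent(+SubPaths, -Extent)
% A CP whose first label is the entry edge 0 is full; otherwise it
% starts at a "redo" edge and is partial.
cp_extent([[First|_]|_], Extent) :-
    (   First =:= 0
    ->  Extent = full
    ;   Extent = partial
    ).
```

For example, taking label 4 as a "redo" edge of foo/2, the query
cp_kind([[0,1,2],[4,5,6]], [4], K) classifies the path as after-retry, while
cp_kind([[0,1,2]], [4], K) classifies it as first-try.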

For example, in foo/2, p1 = (0,1,2) and p2 = (0,3,5,6) are first-try successful
CPs; p3 = (0,3,7) is a first-try failing branch; p4 = (0,1,2) · (4,5,6) is an
after-retry successful CP (although this one is unfeasible, as X > 0 and X = 0
are disjoint conditions), and p5 = (0,1,2) · (4,7) is an after-retry failing branch.
In sorted/1, p6 = (0,2,5,7,10) · (0,2,4) · (11) is a first-try successful CP and
p7 = (0,2,5,7,10) · (0,2,5,7,9) · (12) is a first-try failing CP. It is interesting
to observe the correspondence between the CPs and the test data that make
the program traverse them. In foo/2, p1 is followed by goal foo(1,Z), p2 by
goal foo(0,Z), p3 by foo(-1,Z), p4 is an unfeasible path, and p5 is followed
by foo(1,Z) when we ask for more solutions. As regards sorted/1, p6 is
followed by the goal sorted([0,1]) and p7 by sorted([0,1,0]). As we will see
in Sect. 5, these will become part of the test-cases that we automatically infer.
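
The correspondence between test data and behaviour can be checked directly
on the working example. The sketch below repeats the two predicates of
Fig. 2 in plain Prolog (the ":- pred" declarations are omitted, since they
are not standard Prolog directives) and lists, as comments, the goals
mentioned in the text.

```prolog
% Working example of Fig. 2, without the ":- pred" declarations.
foo(X, Z) :- X > 0, Z = pos.
foo(X, Z) :- X = 0, Z = zero.

sorted(L) :- L = [].
sorted(L) :- L = [_].
sorted(L) :- L = [X,Y|R], X < Y, sorted([Y|R]).

% Goals from the text and their observable outcome:
%   ?- foo(1, Z).          succeeds with Z = pos    (first clause)
%   ?- foo(0, Z).          succeeds with Z = zero   (second clause)
%   ?- foo(-1, Z).         fails                    (both tests fail)
%   ?- sorted([0,1]).      succeeds
%   ?- sorted([0,1,0]).    fails                    (1 < 0 does not hold)
```

Under standard Prolog execution the failing goals report no information
beyond the failure itself, which is why the failing paths need to be made
explicit before a partial evaluator can produce test-cases for them.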

A key feature of our CFGs is that they make explicit the fact that after failing 
with a clause the computation has to re-try with the following clause, unless a 
non-deterministic call is left behind. E.g., in foo/2 the CFG makes explicit that 
the only way to get a first-try failing branch is through the CP (0,3,7), hence
traversing, and failing in, both conditions X > 0 and X = 0. Therefore, test
data to obtain such a behavior will be a negative number for argument X. Other
approaches, like the one in [5], do not handle flows after failure in the same way.
In fact, in [5], edge "3" in foo/2 goes directly to node "F". It is not clear whether
these approaches are able to obtain such test data. As another difference with
previous approaches to TDG of Prolog, we want to highlight that we use CFGs 
just to reason about the program transformation that will be presented in the 
following section and, in particular, to clarify which features we want to capture. 
However, in previous approaches, test-cases are deduced directly from the CFGs. 

4 A Program Transformation to Make Failure Explicit 

As we outlined in Sect. 1, an intrinsic feature of the second phase of our approach 
is that it can only produce results associated to non-failing derivations. This is 
the main reason why the general approach to TDG by PE sketched in Sect. 2 is 
directly applicable only to TDG of imperative languages. To enable its applica- 
tion to Prolog, we propose a program transformation which makes failure explicit 
in the Prolog program. The specialisation of meta-programs has been shown to
have a large number of interesting applications [9]. Futamura projections to
derive compiled code, compilers and compiler generators fall into this category. The
specialization of meta-interpreters for non-standard computation rules has also
been studied. Furthermore, language extensions and enhancements can be easily
expressed as meta-interpreters which perform additional operations on top of the
standard computation. In short, program specialisation offers a general compilation
technique for the wide variety of procedural interpretations of logic programs.
Among them, we propose to carry out our transformation which makes failure in
logic programs explicit by partially evaluating a Prolog meta-interpreter which
captures failing derivations w.r.t. the original program. First, in Sect. 4.1 we
describe such a meta-interpreter, emphasizing the Prolog control features which
we want to capture. Then, Sect. 4.2 describes the control strategies which have
to be used in PE in order to produce an effective transformation.

4.1 A Prolog Meta-Interpreter to Capture Failure 

Given a Prolog program and given a goal, our aim is to define an interpreter in
which the computation of the program and goal produces the same results as the
ones obtained by using the standard Prolog computation but with the difference
that failure is never reported. Instead, an additional argument Answer will be
bound to the value "yes", if the computation corresponds to a successful
derivation, and to "no" if it corresponds to a failing derivation. Predicate solve/4 is
the main predicate of our meta-interpreter, whose first and second arguments are
the predicate signature and arguments of the goal to be executed, and whose third
argument is the answer; for now we ignore the last argument. For instance, the
call solve(foo/2,[0,Z],Answer,_) succeeds with Z = zero and Answer = yes,
and solve(foo/2,[-1,Z],Answer,_) also succeeds, but with Answer = no. The
interpreter has to handle the following issues:

1. The Prolog backtracking mechanism has to be explicitly implemented. To 
this aim, a stack of choice points is carried along during the computation so 
that: 

— if the derivation fails: (1) when the stack is empty, it ends up with success 
and returns the value "no" , (2) otherwise, the computation is resumed 
from the last choice point, if any; 

— if it successfully ends: (1) when the stack is empty, the computation 
finishes with answer "yes" , (2) otherwise, the computation is resumed 
from the last choice point. 

2. When backtracking occurs, all variable bindings, between the current point 
and the choice point to resume from, have to be undone. 

3. The interpreter has to be implemented in a big-step fashion. This is a re- 
quirement for obtaining an effective decompilation. More details are given 
in Sect. 4.2. 
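
The observable behaviour required from the interpreter (failure is never
reported, and an answer argument says what happened) can be approximated
with a very small sketch. The predicate solve_explicit/2 is a hypothetical
name and is not the solve/4 of this paper: it delegates backtracking and the
undoing of bindings to the underlying Prolog engine instead of implementing
an explicit stack of choice points, so it would not be a suitable subject
for the partial evaluation discussed in Sect. 4.2.

```prolog
% Working example predicate foo/2 from Fig. 2.
foo(X, Z) :- X > 0, Z = pos.
foo(X, Z) :- X = 0, Z = zero.

% solve_explicit(+Goal, -Answer)
% Answers "yes" once per solution of Goal and, when no (more)
% solutions exist, answers "no" instead of failing. Bindings made by
% a failed attempt are undone by ordinary backtracking.
solve_explicit(Goal, yes) :- call(Goal).
solve_explicit(_Goal, no).
```

With this sketch, solve_explicit(foo(0,Z), A) first answers A = yes with
Z = zero and, when asked for more solutions, answers A = no with Z unbound,
whereas solve_explicit(foo(-1,Z), A) only answers A = no.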

solve(P/Ar,Args,Answer,TNCPs) :-
    pred(P/Ar,_),
    build_s0(P/Ar,Args,S0,OutVs),
    exec(Args,S0,Sf),
    Sf = st(_,_,_,OutVs',Answer,TNCPs/_),
    OutVs' = OutVs.

exec(_,S,Sf) :-
    S = st(_,[],[],OutVs,yes,NCPs),
    Sf = st(_,_,_,OutVs,yes,NCPs).
exec(_,S,Sf) :-
    S = st(_,[],[_|_],OutVs,yes,NCPs),
    Sf = st(_,_,_,OutVs,yes,NCPs).
exec(_,S,Sf) :-
    S = st(_,_,[],OutVs,no,TNCPs/0),
    Sf = st(_,_,_,OutVs,no,TNCPs/0).
exec(Args,S,Sf) :-
    S = st(_,[],[CP|CPs],_,yes,TNCPs/0),
    build_retry_state(Args,CP,CPs,TNCPs,S'),
    exec(Args,S',Sf).
exec(Args,S,Sf) :-
    S = st(_,_,[CP|CPs],_,no,TNCPs/0),
    build_retry_state(Args,CP,CPs,TNCPs,S'),
    exec(Args,S',Sf).
exec(Args,S,Sf) :-
    S = st(PP,[A|As],CPs,OutVs,yes,TNCPs/ENCPs),
    PP = pp(P/Ar,ClId,Pt),
    internal(A),
    functor(A,A_f,A_ar),
    A =.. [A_f|A_args],
    next(Pt,Pt'),
    solve(A_f/A_ar,A_args,Ans,ENCPs'),
    TNCPs' is TNCPs + ENCPs',
    ENCPs'' is ENCPs + ENCPs',
    PP' = pp(P/Ar,ClId,Pt'),
    S' = st(PP',As,CPs,OutVs,Ans,TNCPs'/ENCPs''),
    exec(Args,S',Sf).
exec(Args,S,Sf) :-
    S = st(PP,[A|As],CPs,OutVs,yes,NCPs),
    PP = pp(P/Ar,ClId,Pt),
    builtin(A),
    next(Pt,Pt'),
    run_builtin(PP,A,Ans),
    PP' = pp(P/Ar,ClId,Pt'),
    S' = st(PP',As,CPs,OutVs,Ans,NCPs),
    exec(Args,S',Sf).

Fig. 3. Code of Prolog meta-interpreter to capture failure

Figure 3 shows an implementation of a meta-interpreter which handles the
above issues. The fourth argument of the main predicate solve/4, named TNCPs,
contains upon success the total number of choice points not yet considered,
whose role will be explained later. The interpreter assumes that the program
is represented as a set of pred/2 and clause/3 facts. There is a pred/2 fact
per predicate providing its predicate signature, number of clauses and mode
information; and a clause/3 fact per clause providing the actual code and clause
identifier. Predicate solve/4 basically builds an initial state S0, by calling
build_s0/4, and then delegates to exec/3 to obtain the final state Sf of the
computation. The output information, OutVs, is taken from Sf. The state carried
along is of the form st(PP,G,CPs,OutVs,Ans,NCPs), where PP is the current
program point, G the current goal, CPs is the stack of choice points (list of
program points), OutVs the list of variables in G corresponding to the output
parameters of the original goal, Ans the current answer ("yes" or "no") and
NCPs the number of choice points left behind. A program point is of the form
pp(P/Ar,ClId,Pt), where P/Ar, ClId and Pt are the predicate signature, the
clause identifier and the program point of the clause at hand. Predicate exec/3
implements the main loop of the interpreter. Given the current state in its second
argument it produces the final state of the computation in the third one. It is
defined by seven clauses, which are applied in the following situations:

1st cl. The current goal is empty, the answer is "yes" and there are no pending
    choice points. Then, the computation finishes with answer "yes". The current
    answer is actually used as a flag to indicate whether the previous step in
    the computation succeeded or failed (see the last two exec/3 clauses).

2nd cl. As the 1st cl., but with at least one pending choice point. This clause
    represents the alternative in which the computation ends. The 4th clause
    takes the other alternative.

3rd cl. The previous step failed and there are no pending choice points. Then,
    the computation ends with answer "no".

4th cl. The current goal is empty, the answer is "yes" and there is at least one
    pending choice point. This is the same situation as in the 2nd clause;
    however, in this case the alternative of resuming from the last choice
    point is taken. The corresponding state S' is built by means of
    build_retry_state/5 and the computation is resumed from S' by recursively
    calling exec/3.

5th cl. The previous step failed and there is at least one pending choice point.
    Then, the computation is resumed from the last choice point in the same
    way as in the previous clause.

6th cl. The first atom to be solved is user-defined. A call to solve/4 handles
    the atom, and the computation proceeds with the next program point of the
    same clause which was the current one before calling solve/4. This way of
    solving a predicate call makes the interpreter big-step (issue (3) above).

7th cl. The first atom to be solved is a builtin. Then, run_builtin/3 produces
    the corresponding answer, and the computation proceeds with the following
    program point. An interesting observation (also applicable to the previous
    clause) is that the answer obtained from run_builtin/3 (or solve/4) is now
    set up as the answer of the next state. This will make the computation go
    through the 3rd or 5th clause in the following step if the obtained answer
    was "no".
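
The case analysis above can be condensed into a small classifier that tells
which exec/3 clauses of Fig. 3 match a given state. This is only an
illustrative sketch: applicable_clause/2 and is_builtin/1 are hypothetical
helpers that are not part of the interpreter, and builtins are approximated
here through predicate_property/2 (SWI-Prolog is assumed).

```prolog
% applicable_clause(+State, -N): the N-th exec/3 clause of Fig. 3 matches State.
% States have the shape st(PP, Goal, CPs, OutVs, Ans, TNCPs/ENCPs).
applicable_clause(st(_, [], [],    _, yes, _),   1).
applicable_clause(st(_, [], [_|_], _, yes, _),   2).
applicable_clause(st(_, _,  [],    _, no,  _/0), 3).
applicable_clause(st(_, [], [_|_], _, yes, _/0), 4).
applicable_clause(st(_, _,  [_|_], _, no,  _/0), 5).
applicable_clause(st(_, [A|_], _,  _, yes, _),   N) :-
    (   is_builtin(A) -> N = 7 ; N = 6 ).

% is_builtin(+Atom): assumption standing in for builtin/1 of Fig. 3.
is_builtin(A) :- predicate_property(A, built_in).
```

For a state with an empty goal, answer "yes", a pending choice point and
ENCPs = 0, both the 2nd and the 4th clauses match, which is the choice between
finishing and resuming from the last choice point. A failed state with pending
choice points and ENCPs > 0 matches no clause, so the computation simply fails.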

The correspondence between these clauses and the flows in the CFGs is as
follows: clauses 1st, 2nd and 4th represent the output edges from every "T"
node. Clause 3rd represents the "no" edges to "F" nodes and 5th the "no" edges
to non-terminal nodes. Finally, clauses 6th and 7th represent the execution of
predicate calls and builtins, respectively, in non-terminal nodes and their
corresponding "yes" edges.

Let us now explain how the interpreter handles the above three issues. To
handle (1), a stack of choice points is carried along within the state,
initialised to contain the initial program points of all clauses defining the
predicate to be solved, except for the first one. E.g., the initial stack of
choice points for sorted/1 is [pp(sorted/1,2,1),pp(sorted/1,3,1)]. How this
stack is used to perform the backtracking has already been explained in the
description of the 4th and 5th exec/3 clauses above. As regards issue (2), a
quite simple way to implement this in Prolog is to produce the necessary fresh
variables every time the computation is resumed. This is done inside
build_retry_state/5. The corresponding unification to link the fresh variables
with the original goal variables is made at the end (see the last line of
solve/4). This is the reason why 1) the list of the actual variables used in
the current goal needs to be carried along within the state; and 2) the
original arguments are carried along as the first argument of exec/3, since the
original ground arguments have to be used when resuming from a choice point.
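
As an illustration of the initialisation described for issue (1), the initial
stack of choice points can be sketched as follows. The helper initial_cps/3 is
hypothetical and takes the number of clauses of the predicate explicitly; how
the interpreter actually obtains this information is not shown here. The
builtin between/3 of SWI-Prolog is assumed.

```prolog
% initial_cps(+P/Ar, +NClauses, -CPs): CPs holds the first program point of
% every clause of P/Ar except the first one, in clause order.
initial_cps(P/Ar, NClauses, CPs) :-
    findall(pp(P/Ar, ClId, 1), between(2, NClauses, ClId), CPs).
```

For a predicate sorted/1 with three clauses, the query
initial_cps(sorted/1, 3, CPs) yields the stack given in the text, and a
predicate with a single clause starts with an empty stack.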

Finally, it is worth mentioning that solve/4 does not return the actual stack
of choice points but only the number of them. This means that during a
computation the interpreter only considers choice points of the predicate being
solved. The question is then, how can the interpreter backtrack to the last
choice point, including those induced by other computations of solve/4? E.g.,
how can the interpreter follow edge "13" in the CFG of sorted/1? The
interpreter performs the backtracking in the following way: 1) The total number
of choice points left behind, TNCPs, is carried along within the state and
finally returned in the last argument of solve/4. 2) The number of choice
points corresponding to invoked predicates, ENCPs, is also carried along. It is
updated right after the call to solve/4 in the 6th clause of exec/3. Both
numbers are stored in the last argument of the state as TNCPs/ENCPs. 3)
Execution is resumed from choice points of the current predicate only if
ENCPs = 0, as can be seen in the 4th and 5th clauses. Otherwise, the
computation just fails and Prolog's backtracking mechanism is used to ask the
last invoked predicate for more solutions. This indeed means that the
non-determinism of the program is still implicit.
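
The bookkeeping of the two counters can be isolated in a short sketch. The
predicate names after_call/3 and can_resume_locally/1 are hypothetical and only
mirror the arithmetic of the 6th exec/3 clause and the TNCPs/0 pattern of the
4th and 5th clauses in Fig. 3.

```prolog
% after_call(+TNCPs/ENCPs, +New, -TNCPs1/ENCPs1): update both counters with
% the New choice points reported by a call to solve/4.
after_call(TNCPs/ENCPs, New, TNCPs1/ENCPs1) :-
    TNCPs1 is TNCPs + New,
    ENCPs1 is ENCPs + New.

% can_resume_locally(+State): the state has a pending choice point of the
% current predicate and no choice points left in invoked predicates.
can_resume_locally(st(_, _, [_|_], _, _, _/0)).
```

For instance, starting from 1/0, a call that leaves two choice points behind
gives 3/2, after which the interpreter may no longer resume from its own stack
and has to rely on Prolog's backtracking.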

4.2 Controlling Partial Evaluation 

The specialisation of interpreters has been studied in many different contexts,
see e.g. [9,10,19]. Very recently, [10] proposed control strategies to
successfully specialise low-level code interpreters w.r.t. non-trivial
programs. Here we demonstrate how such guidelines can be, and should be, used
in the specialisation of non-trivial Prolog meta-interpreters. They include:

1. Big-step interpreter. This solves the problem of handling recursion (see
   [10]) and enables a compositional specialisation w.r.t. the program
   procedures (or predicates). Note that an effective treatment of recursion is
   especially important in Prolog programs, where recursion is heavily used.

2. Optimality issues. Optimality must ensure that: a) the code to be
   transformed is traversed exactly once, and b) residual code is emitted once
   in the transformed program. To achieve optimality, during unfolding, all
   atoms corresponding to divergence or convergence points in the CFG of the
   program to be transformed have to be memoised (see Sect. 2.1). A divergence
   (convergence) point is a program point from (to) which two or more flows
   originate (converge).

We already explained that the interpreter in Fig. 3 is big-step. As regards
optimality, by looking at the CFGs of Fig. 2, we can observe: 1) all program
points are divergence points except those corresponding to unifications in
which one argument is a variable, and 2) the first program point of every
clause, except for the one of the first clause, is a convergence point. We
assume that conv_points(P) and div_points(P) denote, respectively, the set of
convergence points and divergence points of a predicate P. We follow the syntax
of [10] for PE annotations. An annotation is of the form "[Precond =>] Ann
Pred" where Precond is an optional precondition defined as a logic formula, Ann
is the kind of annotation (only memo in this case), and Pred is a predicate
descriptor, i.e., a predicate functor and distinct free variables. Then, to
achieve an effective transformation, we specialise the interpreter in Fig. 3
w.r.t. the program to be transformed by using the following annotation for each
predicate P/Ar in the program:

    PP ∈ div_points(P/Ar) ∪ conv_points(P/Ar) => memo exec(_, st(PP,_,_,_,_,_), _)

Additionally, solve/4 and run_builtin/3 are also annotated to be always
memoised, in order to avoid code duplication.

This already describes how the specialisation has to be steered in the local 
control. As regards the global control, the only predicate which can introduce 
non-termination is exec/3. Its first and third arguments contain a fixed structure 
with variables. The second one might be problematic as it ranges over the set of 
all computable states at specialisation time. Note that the number of computable 
states remains finite thanks to the big-step nature of the interpreter. Still, it can 
solve(foo/2,[C,D],A,B) :-
    run_builtin_1(E,C),
    exec_1(C,E,F,A,B), F = [D].

exec_1(A,no,F,G,H) :- exec_2(A,F,G,H).
exec_1(_,yes,[pos],yes,1).
exec_1(A,yes,F,G,H) :- exec_2(A,F,G,H).

exec_2(A,G,H,I) :-
    run_builtin_2(K,A), exec_3(K,G,H,I).

exec_3(no,[_],no,0).
exec_3(yes,[zero],yes,0).

run_builtin_1(yes,A) :- A#>0.
run_builtin_1(no,A) :- \+ A#>0.

run_builtin_2(yes,A) :- A#=0.
run_builtin_2(no,A) :- \+ A#=0.


Fig. 4. Transformed code with explicit failure for foo/2 



happen that the same program point is reached with different values for the
NCPs sub-term of the state. Therefore, if one wants to achieve the optimality
criterion above, this argument always has to be generalised in global control.

Example 1. Figure 4 depicts the transformed code we obtain for predicate foo/2.
It can be observed that there is a clear correspondence between the transformed
code and the CFG in Fig. 2. Thus, predicate solve/4 represents the node "X>0"
and exec_1/5 implements its continuation, whose three clauses correspond to the
three sub-paths (3), (1,2) and (1,2,4) respectively. Predicate exec_2/4
represents the node "X=0" and exec_3/4 implements its continuation, whose two
clauses correspond to the sub-paths (7) and (5,6). Note that edge "8" is not
considered in the meta-interpreter (nor in the transformed program) as it is
meaningless for TDG. It is worth mentioning that the transformed program
captures the way in which variable bindings are undone. For instance, in
solve(foo/2,[C,D],...), if we keep track of variables C and D, it can be seen
that D, which corresponds to variable Z in the original code, is only used for
the final unification F=[D], while new fresh variables are used for the
unifications with pos and zero. However, variable C, which corresponds to
variable X in the original code, is actually used for the checks in
run_builtin_1/2 and run_builtin_2/2. This turns out to be fundamental when
trying to obtain test data associated to the first-try failing CP (0,3,7). It
must be one and the same variable which is, at the same time, not ">0" and not
"=0". Otherwise we cannot obtain a negative number as test data for such CP.
Finally, observe that the original Prolog arithmetic builtins have been
(automatically) transformed into their clpfd counterparts.*



5 Generating Test Cases by Partial Evaluation 

Once the original Prolog program has been transformed into an equivalent Prolog
program with explicit failure, we can use the approach of [2] to carry out phase
II (see Fig. 1) and generate test data both for successful and failing
derivations. As we have explained in Sect. 2.2, the idea is to perform a second
PE over the CLP transformed program where the unfolding rule plays the role of
the coverage criterion. In [2], an unfolding rule implementing the
block-count(k) coverage criterion was proposed. A set of computation paths
satisfies the block-count(k) criterion if it includes all terminating
computation paths which can be built in which the number of times each block is
visited does not exceed the given k. The blocks the criterion refers to are the
blocks or nodes in the CFGs of the original Prolog program. As the only form of
loops in Prolog are recursive calls, the "k" in block-count(k) actually
corresponds to the number of recursive calls which are allowed.

* We are using the clpfd library of SICStus Prolog. See [20] for details.

Unfortunately, the presence of Prolog's negation in our transformed programs
complicates this phase. The negation will appear in the transformed program
for "no" branches originating from nodes corresponding to a (possibly) failing
builtin. See for example predicates run_builtin_1/2 and run_builtin_2/2 in
the transformed code of foo/2 in Fig. 4. While Prolog's negation works well
for ground arguments, it gives no information for free variables, as is
required in the evaluation performed during this TDG phase. In particular, in
the foo/2 example, given the computation which traverses the calls "\+ A#>0"
and "\+ A#=0" (corresponding to the path (0,3,7) in the CFG), we need to infer
that "A<0". In other words, we need somehow to turn the negative information
into positive information. This transformation is straightforward for
arithmetic builtins: we just have to replace "\+ e1#=e2" by "e1#\=e2",
"\+ e1#>e2" by "e1#=<e2", etc.

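The replacement of negated arithmetic constraints by their positive
counterparts can be sketched as a small source-to-source rewriting. This is a
minimal illustration rather than the implementation used in the paper: the
predicate names positive/2 and rewrite_body/2 are hypothetical, the operators
are those of library(clpfd) in SWI-Prolog (an assumption), and the equivalence
only holds when both operands are integer expressions.

```prolog
:- use_module(library(clpfd)).

% positive(+C, -P): P is a constraint equivalent to "\+ C" over the integers.
positive(A #=  B, A #\= B).
positive(A #\= B, A #=  B).
positive(A #>  B, A #=< B).
positive(A #>= B, A #<  B).
positive(A #<  B, A #>= B).
positive(A #=< B, A #>  B).

% rewrite_body(+Body, -Body1): replace every negated arithmetic constraint
% in a conjunction by its positive counterpart; other goals are kept as is.
rewrite_body(G, G) :- var(G), !.
rewrite_body((G1, G2), (H1, H2)) :- !,
    rewrite_body(G1, H1),
    rewrite_body(G2, H2).
rewrite_body(\+ C, P) :- nonvar(C), positive(C, P), !.
rewrite_body(G, G).
```

Applied to the conjunction (\+ A#>0, \+ A#=0), the rewriting gives
(A#=<0, A#\=0), from which the constraint solver can infer that A is negative.
Negated unifications such as \+ L=[] are left untouched by this sketch.
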
Example 2. This transformation allows us to obtain the following set of
test-cases for foo/2:

    { ([1],[pos],yes/first-try),   ([1],[_],no/after-retry),
      ([0],[zero],yes/first-try),  ([-100],[_],no/first-try) }

They correspond respectively (reading by rows) to the CPs (0,1,2),
(0,1,2)·(4,7), (0,3,5,6) and (0,3,7). Each test-case is represented as a
3-tuple (Ins, Outs, Ans), where Ins is the list of input arguments, Outs the
list of output arguments and Ans the answer. The answer takes the form A/B with
A ∈ {yes, no} and B ∈ {first-try, after-retry}^, so that we obtain sufficient
information about the kind of CP to which the test-case corresponds (see
Sect. 3). As there are no recursive calls in foo/2, such test-cases are
obtained using the block-count(k) criterion for any k (greater than 0). The
domain used for the integer numbers is {-100..100}.
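
To make the example concrete, the code of Fig. 4 can be run directly once the
two negated constraints are written positively. The following is a sketch under
stated assumptions: it uses library(clpfd) of SWI-Prolog instead of the SICStus
library used in the paper, test_case/3 is a hypothetical driver that plays the
role of the second PE by plain enumeration, and the first-try/after-retry
component of the answer is not computed.

```prolog
:- use_module(library(clpfd)).

% Code of Fig. 4, with "\+ A#>0" and "\+ A#=0" replaced by "A#=<0" and "A#\=0".
solve(foo/2, [C,D], A, B) :-
    run_builtin_1(E, C),
    exec_1(C, E, F, A, B), F = [D].

exec_1(A, no, F, G, H) :- exec_2(A, F, G, H).
exec_1(_, yes, [pos], yes, 1).
exec_1(A, yes, F, G, H) :- exec_2(A, F, G, H).

exec_2(A, G, H, I) :- run_builtin_2(K, A), exec_3(K, G, H, I).

exec_3(no, [_], no, 0).
exec_3(yes, [zero], yes, 0).

run_builtin_1(yes, A) :- A #> 0.
run_builtin_1(no, A)  :- A #=< 0.

run_builtin_2(yes, A) :- A #= 0.
run_builtin_2(no, A)  :- A #\= 0.

% test_case(-X, -Z, -Ans): one input X per path of solve/4, taken from the
% domain -100..100; Z is the output and Ans the answer of that path.
test_case(X, Z, Ans) :-
    X in -100..100,
    solve(foo/2, [X,Z], Ans, _),
    once(label([X])).
```

Enumerating the solutions of test_case(X, Z, Ans) gives, in order, the inputs
1, 1, 0 and -100 with answers yes, no, yes and no, i.e. the four test-cases
listed above. The negative input is only found because both checks constrain
the same variable.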

^ To simplify the presentation in Sect. 4.1, we decided not to include in the
interpreter the support to calculate the first-try/after-retry value.

However, it can be the case that negation involves unifications with symbolic
data. For example, the transformed code for sorted/1 includes the negations
"\+ L=[]" and "\+ L=[_|_]". As before, we might write transformations for the
negated unifications involving lists, so that at the end it is inferred that
"L=[_,_|_]". However, this would be too ad hoc a solution, as many distinct
term structures, different from lists, can appear in negated unifications. A
solution for this problem has been recently proposed for Mercury in the same
context [5]. It roughly consists in the following: 1) It is assumed that each
predicate argument is well-typed. 2) A domain is initialised for each variable,
containing the set of possible functors the variable can take. 3) When a
negated unification involving an output variable is found (in their terminology
a negated decomposition), the corresponding functor is removed from the
variable domain. The assumption that complex unifications are broken down into
simple ones is crucial at this point. 4) Finally, a search algorithm is
described to generate particular values from the type definition and final
domain for the variable. The technique is implemented using CHR and can be
directly used in principle for our purposes as well.
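
Steps 2) and 3) of this scheme can be illustrated with plain lists of functors.
The sketch below is not the CHR implementation of [5]; it is a simplified
reading of the idea in which a domain is a list of Name/Arity pairs, and all
predicate names (type_functors/2, initial_domain/2, negated_unif/3,
candidate/2) are hypothetical. SWI-Prolog with library(apply) and library(lists)
is assumed.

```prolog
:- use_module(library(apply)).
:- use_module(library(lists)).

% type_functors(?Type, -Skeletons): one skeleton term per functor of Type.
type_functors(list, [[], [_|_]]).

% initial_domain(+Type, -Dom): Dom lists the Name/Arity of every functor.
initial_domain(Type, Dom) :-
    type_functors(Type, Skeletons),
    findall(N/A, (member(T, Skeletons), functor(T, N, A)), Dom).

% negated_unif(+Term, +Dom0, -Dom): effect of "\+ X = Term" on the domain
% of X, namely removing the functor of Term.
negated_unif(Term, Dom0, Dom) :-
    functor(Term, N, A),
    exclude(==(N/A), Dom0, Dom).

% candidate(+Dom, -Term): a most general term whose functor is still in Dom.
candidate(Dom, Term) :-
    member(N/A, Dom),
    functor(Term, N, A).
```

Starting from the domain of a list-typed variable L, the negation "\+ L=[]"
leaves only the list constructor, so the single candidate for L is a non-empty
list. Removing that functor as well empties the domain, which signals an
infeasible path.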

On the other hand, advanced declarative languages like TOY [15] make possible
the co-existence of different constraint domains. In particular, the
co-existence of boolean and numeric constraint domains enables the possibility
of using disequalities involving both symbolic data and numbers. This allows,
for example, expressing the negated unifications "\+ L=[]" and "\+ L=[_|_]" as
disequality constraints "L/=[]" and "L/=[_|_]". Additionally, by relying on the
boolean constraint solver, the negated arithmetic builtins "\+ A#>0" and
"\+ A#=0" can be encoded as "(A#>0) == false" and "(A#=0) == false". This is
in principle a more general solution that we want to explore, although a
thorough experimental evaluation needs to be carried out to demonstrate its
applicability to our particular context.

Example 3. Now, by using any of the techniques outlined above, we obtain the
following set of test-cases for sorted/1, using block-count(2) as the coverage
criterion:

    { ([[]],[],yes/first-try),          ([[0]],[],yes/first-try),
      ([[0,1]],[],yes/first-try),       ([[0,1,2]],[],yes/first-try),
      ([[0,1,2,0|_]],[],no/first-try),  ([[0,1,0|_]],[],no/first-try),
      ([[0,0|_]],[],no/first-try) }

They correspond respectively (reading by rows) to the CPs "(0,1)", "(0,2,4)",
"(0,2,5,7,10)·(0,2,4)·(11)", "(0,2,5,7,10)·(0,2,5,7,10)·(0,2,4)·(11)·(11)",
"(0,2,5,7,10)·(0,2,5,7,10)·(0,2,5,7,9)·(12)·(12)",
"(0,2,5,7,10)·(0,2,5,7,9)·(12)" and "(0,2,5,7,9)". They are indeed all the
paths that can be followed with no more than 3 recursive calls. This time the
domain has been set up to {0..100}.
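
The effect of bounding the number of calls can be reproduced with a
hand-written version of sorted/1 in which failure is explicit. This is only a
sketch under several assumptions: it is not the output of the PE-based
transformation, sorted/1 is assumed to require strictly increasing elements
(as suggested by the failing input [0,0|_]), the names sorted_path/3,
sorted_body/3 and sorted_test_case/3 are hypothetical, and library(clpfd) of
SWI-Prolog is used.

```prolog
:- use_module(library(apply)).
:- use_module(library(clpfd)).

% sorted_path(+Calls, ?List, -Ans): follow one path of sorted/1 using at
% most Calls calls to the predicate; Ans is yes or no.
sorted_path(Calls, L, Ans) :-
    Calls > 0,
    Calls1 is Calls - 1,
    sorted_body(L, Calls1, Ans).

sorted_body([], _, yes).
sorted_body([X], _, yes) :- X in 0..100.
sorted_body([X,Y|T], Calls, Ans) :-
    [X,Y] ins 0..100,
    (   X #< Y, sorted_path(Calls, [Y|T], Ans)   % "yes" edge of the check
    ;   X #>= Y, Ans = no                        % "no" edge made explicit
    ).

% sorted_test_case(+MaxCalls, -List, -Ans): one labelled input per path.
% Only the constrained elements are labelled; an open tail stays unbound.
sorted_test_case(MaxCalls, L, Ans) :-
    sorted_path(MaxCalls, L, Ans),
    term_variables(L, Vs0),
    include(fd_var, Vs0, Vs),
    once(label(Vs)).
```

With a bound of three calls, the solutions of sorted_test_case(3, L, Ans) are,
in order, the seven inputs of Example 3: four sorted lists with answer yes and
three partial lists with answer no.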

6 Conclusions and Ongoing work 

Very recently, we proposed in [2] a generic approach to TDG by PE which in
principle can be used for any imperative language. However, applying this
approach to TDG of a declarative language like Prolog introduces some
difficulties, like the handling of failing derivations and of symbolic data. In
this work, we have sketched solutions to overcome such difficulties. In
particular, we have proposed a program transformation, based on PE, to make
failure explicit in the Prolog programs. To handle Prolog's negation in the
transformed programs, we have outlined existing solutions that make it possible
to turn the negative information into positive information. Though our
preliminary experiments already suggest that the approach can be very useful to
generate test-cases for Prolog, we plan to carry out a thorough practical
assessment. This requires covering additional Prolog features like the module
system, builtins like cut/0, fail/0, if/3, etc., and also comparing the results
with other TDG systems. We also want to study the integration of other kinds of
coverage criteria, like data-flow based criteria. Finally, we would like to
explore the use of static analyses in the context of TDG. For instance, the
information inferred by a failure analysis can be very useful to prune some of
the branches that our transformed programs have to consider.